9 - Deep Learning [ID:11763]

Okay, so welcome back everybody and welcome to our lecture on deep reinforcement learning.

So today we want to look a bit into game theory and into strategies for learning to play games.

So we actually divide this topic into several blocks.

Before we really go into the concept of reinforcement

learning, we first look into the process of the sequential

decision making.

So this is not yet really reinforcement learning, but it's one step towards the actual game problem.

Here we look essentially into state-free systems: systems that can only perform actions, and you have to choose which action probably produces the highest reward.

And based on that, we will go ahead and extend the system with an additional state; then we are really in Markov decision processes and moving towards reinforcement learning.

And we conclude the lecture by the topic of deep reinforcement learning, where we then

really apply the things that we've learned so far in deep learning to that particular

topic.

Okay, so let's start and we start with the concept of, as I said, sequential decision

making.

And here we essentially have a problem that is also referred to as the multi-armed bandit

problem.

So you can choose from several actions.

You have an action that you can choose at a time t, and there is a set of actions, capital A.

So you could say, for example, if you have these one-armed bandits in the casino, then your action is to choose one of the machines and pull the lever, and then you get some reward.

But you don't know about the state, and all of them essentially behave the same.

So you just choose one specific bandit.

So then you take an action, and the action at time t has its own unknown probability density function that generates some reward r.

So r will be the rewards and what we want to get, of course, is the maximum reward.

We want to win money or we want to get a high score or something like that.

The reward is also immediate, so in this case it's very simple: I pull and I immediately get a reward.

Later in the games it will be much more complicated because the reward may be generated only at

a very distant future.

If you consider playing chess, for example, then you get the reward at the very end only

when you win the game.

So that's much more complicated.

Here we get an immediate reward: we pull, and each of those one-armed bandits then generates some reward. Depending on which machine you've chosen, you get a different one, and maybe they have different probability distributions, right?
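The setup described so far can be sketched in a few lines of code. This is a minimal illustration, not from the lecture itself: the Gaussian reward distributions and the hand-picked means are assumptions just to make the example concrete.

```python
import random

class Bandit:
    """One slot machine: pulling the lever samples an immediate reward
    from a probability distribution that is unknown to the player."""

    def __init__(self, mean, std=1.0):
        # Illustrative assumption: Gaussian rewards with a fixed mean.
        self.mean = mean
        self.std = std

    def pull(self):
        return random.gauss(self.mean, self.std)

# The set of actions A: choosing which of the machines to play.
# Each machine has a different (hidden) expected reward.
bandits = [Bandit(mean) for mean in (0.2, 0.5, 1.0)]

# Taking action a at time t yields an immediate reward r.
rewards = [bandits[a].pull() for a in (0, 1, 2)]
```

There is no state here: every pull is independent, and the only thing the player can learn is which action tends to pay off best.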

Okay, so then because we now have actions and we have rewards, we somehow have to choose

an action.

And the choosing of an action we can formalize as a policy.

So these are the important concepts.

They will also be important for our reinforcement learning.

So we choose an action, the action generates a reward, and we choose the action by some policy. The policy we denote here with pi, and we can also model it as a kind of probability distribution over the actions.
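One common way to realize such a policy for the bandit problem, mixing a deterministic (greedy) choice with random action selection, is epsilon-greedy. This is a hedged sketch: the epsilon value and the running-mean value estimate are illustrative assumptions, not details from the lecture.

```python
import random

def make_epsilon_greedy_policy(n_actions, epsilon=0.1):
    """Return a policy pi and an update rule for the bandit problem."""
    values = [0.0] * n_actions  # estimated mean reward per action
    counts = [0] * n_actions    # how often each action was taken

    def policy():
        if random.random() < epsilon:
            # Explore: pick a random action.
            return random.randrange(n_actions)
        # Exploit: pick the action with the highest estimated reward.
        return max(range(n_actions), key=lambda a: values[a])

    def update(action, reward):
        # Incremental running mean of the observed rewards.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return policy, update
```

With epsilon set to 0 the policy is purely deterministic (always greedy); with epsilon set to 1 it is a purely random policy, which matches the spectrum of action-selection strategies mentioned in the lecture tags.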

Part of a video series

Accessible via: Open access

Duration: 01:11:46 min

Recording date: 2019-07-04

Uploaded on: 2019-07-04 18:09:02

Language: en-US

Tags: tasks evaluation decision networks learning function network reinforcement random policy action selection deterministic